International Journal of Trend in Scientific Research and Development (IJTSRD) 
Volume 5 Issue 4, May-June 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 - 6470 


Fashion AI Literature 
Ashish Jobson1, Dr. Kamalraj R2


1MCA, 2Associate Professor, 
1,2Department School of CS and IT, Jain University, Bangalore, Karnataka, India 


ABSTRACT 


We concentrate on the task of Fashion AI, which entails creating images that
are multimodal in terms of semantics. Previous research has attempted to use
several class-specific generators, which limits its application to datasets
with a limited number of classes. Instead, we suggest a new Group Decreasing
Network (GroupDNet), which uses group convolutions in the generator and
gradually decreases the group numbers of the convolutions in the decoder. As a
result, GroupDNet has a lot of control over converting semantic labels to
natural images and can produce plausible high-quality results for datasets
with a lot of classes. Experiments on a variety of difficult datasets show
that GroupDNet outperforms other algorithms in the semantically multi-modal
image synthesis (SMIS) task. We also demonstrate that GroupDNet can perform a
variety of interesting synthesis tasks.
I. INTRODUCTION

Fashion AI has a wide variety of real-world uses and draws a lot of interest
because it converts semantic labels to natural images. In recent years,
convolutional neural networks have been used to effectively complete object
detection, object recognition, image segmentation, and texture synthesis. By
its nature, the task is a one-to-many mapping problem: a single semantic label
may be associated with a large number of different natural images. Previous
studies have addressed this by using a variational auto-encoder, injecting
noise during training, building several sub-networks, and using instance-level
feature embeddings, among other methods. While these approaches have made
considerable strides in terms of image quality and execution, we take it a
step further by working on a complex multi-modal image synthesis task that
allows us to have greater control over the output. While this has resulted in
a significant improvement over the previous generation of features, these
networks are trained in a fully supervised manner on vast volumes of data,
requiring expensive and time-consuming annotation. Features learned on one
dataset can be applied to another, but not all datasets are created equal, so
features learned on ImageNet will not perform as well on data from other
datasets. As the number of classes increases, however, an approach based on
several sub-networks quickly degrades in efficiency, increases training time
linearly, and consumes more computational resources.


II. METHODOLOGY 
We provide comprehensive network designs for each dataset 
in this section. The discriminator's architecture is shown. 


1. The discriminator's design remains consistent across datasets.

2. The encoder architecture is shown for the various datasets.

3. The decoder architectures for DeepFashion are depicted.

4. The design of the decoder for ADE20K is shown. Since ADE20K has so many
classes, we reduce the number of channels in each group to avoid excessive
GPU memory consumption.




The total network capacity decreases in this situation, which we believe is
not beneficial to performance. Therefore, we add a few more convolutional
layers to increase network capacity; as a result, the ADE20K decoder differs
in structure from the decoders used for the other two datasets.
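The capacity trade-off described above can be made concrete by counting the weights of a grouped convolution. The following is only an illustrative sketch: the function name, the 3x3 kernel, and the channel widths (8 versus 4 channels per group) are assumptions chosen for the arithmetic and are not the settings used in this work; only the figure of 150 classes for ADE20K comes from the text.

```python
def grouped_conv_weights(in_channels: int, out_channels: int,
                         kernel_size: int, groups: int) -> int:
    """Number of weights (bias excluded) in a 2-D grouped convolution."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by the group number")
    # Each output channel only sees the in_channels // groups inputs of its group.
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


# Hypothetical widths: one group per ADE20K class, 3x3 kernels.
NUM_GROUPS = 150
wide = grouped_conv_weights(NUM_GROUPS * 8, NUM_GROUPS * 8, 3, NUM_GROUPS)
narrow = grouped_conv_weights(NUM_GROUPS * 4, NUM_GROUPS * 4, 3, NUM_GROUPS)
print(wide, narrow)
```

Under these assumed widths, halving the channels per group reduces the layer's weights by a factor of four, which is the kind of capacity loss the additional convolutional layers are meant to offset.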


Training details 

All models are trained for 100 epochs on DeepFashion, with the learning rates
for both the generator and discriminator held constant for the first 60 epochs
before decaying linearly to zero over the last 40.
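The schedule described above can be sketched as a small function. This is a minimal illustration rather than the training code itself: the function name, the 0-indexed epoch convention, the exact epoch at which the decay begins, and the base learning rate value are all assumptions, since the text only states "constant for 60 epochs, then linear decay to zero over 40".

```python
def learning_rate(epoch: int, base_lr: float,
                  total_epochs: int = 100, constant_epochs: int = 60) -> float:
    """Learning rate for a 0-indexed epoch under a constant-then-linear schedule."""
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch out of range")
    if epoch < constant_epochs:
        return base_lr
    # Fraction of the decay phase that is still ahead; reaches 0 at total_epochs.
    remaining = (total_epochs - epoch) / (total_epochs - constant_epochs)
    return base_lr * remaining


BASE_LR = 2e-4  # placeholder value, not taken from the text
schedule = [learning_rate(e, BASE_LR) for e in range(100)]
```

The same function would be applied to both the generator and the discriminator, since the text gives them the same schedule.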

> The batch size is 32 for DeepFashion and 16 for ADE20K. The smaller batch
for ADE20K is due to the large number of channels needed to provide
sufficient capacity for its 150 classes.

> Following SPADE, Glorot initialization is used to set the network weights.
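For reference, Glorot initialization scales the initial weights by the fan-in and fan-out of a layer. The sketch below uses the normal variant, with standard deviation sqrt(2 / (fan_in + fan_out)); whether the normal or the uniform variant was used here, and whether any extra gain was applied, is not stated in the text, so both choices are assumptions, as are the function names and the seed.

```python
import math
import random
from typing import List, Tuple


def conv_fans(in_channels: int, out_channels: int, kernel_size: int) -> Tuple[int, int]:
    """Fan-in and fan-out of a square 2-D convolution kernel."""
    receptive_field = kernel_size * kernel_size
    return in_channels * receptive_field, out_channels * receptive_field


def glorot_std(fan_in: int, fan_out: int) -> float:
    """Standard deviation of the Glorot (Xavier) normal distribution."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def glorot_normal_weights(in_channels: int, out_channels: int,
                          kernel_size: int, seed: int = 0) -> List[float]:
    """Sample a flat list of kernel weights from the Glorot normal distribution."""
    fan_in, fan_out = conv_fans(in_channels, out_channels, kernel_size)
    std = glorot_std(fan_in, fan_out)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    count = out_channels * in_channels * kernel_size * kernel_size
    return [rng.gauss(0.0, std) for _ in range(count)]


weights = glorot_normal_weights(in_channels=4, out_channels=8, kernel_size=3)
```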


Strategy for choosing a group number 

It is difficult to devise an algorithmic approach for calculating the
decreasing group numbers when GPU memory capacity, batch size, and the number
of parameters are all restricted. To set the group numbers, however, we
followed two rules:

1. In the first few layers of the decoder, the group numbers decrease
dramatically, lowering the computational cost significantly;

2. Each earlier layer's group number is either the same as or twice that of
the next layer.
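The second rule is mechanical enough to check in code. The sketch below validates a candidate sequence of decoder group numbers against it; the function name and the example sequences are hypothetical and are not the configurations used in this work. The first rule is qualitative ("decrease dramatically") and is therefore not checked.

```python
from typing import Sequence


def follows_group_rule(groups: Sequence[int]) -> bool:
    """Check rule 2: each layer's group number equals or doubles the next layer's."""
    if any(g <= 0 for g in groups):
        return False
    return all(
        earlier == later or earlier == 2 * later
        for earlier, later in zip(groups, groups[1:])
    )


# Hypothetical decoder configurations, listed from the first layer to the last.
valid_candidate = [8, 4, 2, 2, 1, 1]
invalid_candidate = [8, 2, 1]  # 8 -> 2 skips a halving step
print(follows_group_rule(valid_candidate), follows_group_rule(invalid_candidate))
```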
Figure 1: Architecture of our generator method.
Figure 2: Functional Architecture


IV. RESULTS AND DISCUSSION 

A. Segmentation Performance 

If the generated images look realistic, it is reasonable to expect that their
predicted labels closely match those of the real images. We therefore use the
evaluation procedure from previous work to assess the segmentation accuracy
of the generated images. Ignoring the classes that can be disregarded, we
report the mean Intersection-over-Union (mIoU) and pixel accuracy (Acc)
metrics. For DeepFashion, images are evaluated using well-trained
segmentation models from the off-the-shelf human parser CIHP.
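The two metrics can be sketched as follows for flattened label maps. This is a minimal illustration, not the evaluation script used here: the function name, the ignore label value of 255, and the toy inputs are assumptions, and a real evaluation would accumulate the counts over the whole test set before dividing.

```python
from typing import List, Sequence, Tuple


def miou_and_accuracy(pred: Sequence[int], target: Sequence[int],
                      num_classes: int, ignore_label: int = 255) -> Tuple[float, float]:
    """Mean IoU over classes that occur, and pixel accuracy, skipping ignored pixels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    correct = valid = 0
    for p, t in zip(pred, target):
        if t == ignore_label:  # pixels of ignored classes do not count
            continue
        valid += 1
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[t] += 1
            correct += 1
    ious: List[float] = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection[c] / union)
    miou = sum(ious) / len(ious) if ious else 0.0
    accuracy = correct / valid if valid else 0.0
    return miou, accuracy


# Toy example: labels predicted by a parser on a generated image vs. the input labels.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
miou, acc = miou_and_accuracy(pred, target, num_classes=3)
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, and four of the five valid pixels are labelled correctly.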


B. Comparison on Label-to-Image Transformation

Based on the qualitative results, MulNet, GroupNet, GroupEnc, and GroupDNet
are clearly capable of producing semantically multi-modal images. MulNet,
GroupNet, BicycleGAN, and DSCGAN, on the other hand, produce implausible
images of low quality. GroupEnc achieves better image quality, but it performs
poorly on the SMIS task: when the upper clothes are changed to a different
design, GroupEnc also slightly changes the color of the short denim pants, as
seen in the first two rows. For the compared methods, we test the well-trained
models available for download from their official GitHub repositories, where
such models exist. For experiments not included in their original reports, we
run their code with the same settings as GroupDNet. On the DeepFashion and
Cityscapes datasets, our network performs similarly to SPADE because it is
built on SPADE. Although our system performs worse than SPADE on the ADE20K
dataset, it still outperforms the other methods.


This result shows the SPADE architecture's superiority while also exposing
GroupDNet's difficulty in handling datasets with a vast number of semantic
classes. In general, GroupDNet's images are more realistic and believable
than those produced by the other methods. These visual results consistently
show the high quality of the images GroupDNet generates, demonstrating its
effectiveness on a variety of datasets.





Figure 2: Output


V. CONCLUSION 

Unlike other possible solutions such as multiple generators, our network uses
group convolutions throughout and decreases the group numbers of the
convolution layers in the decoder to improve training efficiency. Although
GroupDNet performs well on semantically multi-modal synthesis tasks and
delivers relatively high-quality results, several problems remain to be
addressed. First, despite being nearly twice as fast as multiple-generator
networks, it takes more computational resources to train than pix2pixHD and
SPADE. Second, for datasets with minimal variety, GroupDNet still fails to
model different layouts of a single semantic class, although it does show
some minor differences in illumination, colour, and texture.